{
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Learning to recognize handwritten digits using a neural network"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "We have now reached the point where we can tackle a very interesting task: applying the knowledge we have gained with machine learning in general, and `Flux.jl` in particular, to create a neural network that can recognize handwritten digits! The data are from a data set called MNIST, which has become a classic in the machine learning world.\n\n[We could also try to apply the techniques to the original images of fruit instead. However, the fruit images are much larger than the MNIST images, which makes the learning a suitable neural network too slow.]"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Data munging"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "As we know, the first difficulty with any new data set is locating it, understanding what format it is stored in, reading it in and decoding it into a useful data structure in Julia.\n\nThe original MNIST data is available [here](http://yann.lecun.com/exdb/mnist); see also the [Wikipedia page](https://en.wikipedia.org/wiki/MNIST_database). However, the format that the data is stored in is rather obscure.\n\nFortunately, various packages in Julia provide nicer interfaces to access it. We will use the one provided by `Flux.jl`.\n\nThe data are images of handwritten digits, and the corresponding labels that were determined by hand (i.e. by humans). Our job will be to get the computer to **learn** to recognize digits by learning, as usual, the function that relates the input and output data."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Loading and examining the data"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "First we load the required packages:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "using Flux, Flux.Data.MNIST"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now we read in the data:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "labels = MNIST.labels();\nimages = MNIST.images();  # the semi-colon (`;`) here is important: it prevents Julia from displaying the object"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 1\n\nExamine the `labels` data. Then examine the first few images. *Do not try to view the whole of the `images` object!* Try to drill down to discover how the data is laid out."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 2\n\nConvert the first image to a matrix of `Float64`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Munging the data"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "In the previous notebooks, we arranged the input data for Flux as a `Vector` of `Vector`s.\nNow we will use an alternative arrangement, as a matrix, since that allows `Flux` to use matrix operations, which are more efficient.\n\nThe column $i$ of the matrix is a vector consisting of the $i$th data point $\\mathbf{x}^{(i)}$.  Similarly, the desired outputs are given as a matrix, with the $i$th column being the desired output $\\mathbf{y}^{(i)}$."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 3\n\nAn image is a matrix of colours, but now we need a vector instead. To do so, we just arrange all of the elements of the matrix in a certain way into a single list; fortunately, Julia already provides the function `vec` to do so!\n\n1. Which order does `vec` use? [This reflects the underlying way in which the matrix is stored in memory.]\n\n2. How can you convert an image into a `Vector` of `Float64`?\n\n3. Define a variable $n$ that is the length of these vectors."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 4\nMake a function `rewrite` that accepts a range and converts that range of images to floating-point vectors and stacks them horizontally using `hcat` and the \"splat\" operator `...`. \n\nWe also want a matrix of one-hot vectors. `Flux` provides a function `onehotbatch` to do this (you will need to import it). It works like `onehot`, but takes in a vector of labels and outputs a matrix `Y`.\n\nReturn the pair `(X, Y)`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Setting up the neural network"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now we must set up a neural network. Since the data is complicated, we may expect to need several layers.\nBut we can start with a single layer.\n\n- The network will take as inputs the vectors $\\mathbf{x}^{(i)}$, so the input layer has $n$ nodes. \n\n- The output will be a one-hot vector encoding the digit from 1 to 9 or 0 that is desired. There are 10 possible categories, so we need an output layer of size 10. \n\nIt is then our task as neural network designers to insert layers between these input and output layers, whose weights will be tuned during the learning process. *This is an art, not a science*! But major advances have come from finding a good structure for the network."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Softmax"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "We will make a network with a single layer; let's choose each neuron in the layer to use the `relu` activation function. \nThe output `relu` can be arbitrarily large, but in the end we will wish to compare the network's output with one-hot vectors, i.e. values between $0$ and $1$.\n\nIn order to make this work, we will thus use an extra function at the end that takes in a vector of arbitrary real numbers and maps it (\"squashes it down\") to a vector of numbers between $0$ and $1$.\n\nThe most used function with this property is $\\mathrm{softmax}$. Firstly we take the exponential of each input variable to make them positive. Then we divide by the sum to make sure they lie between $0$ and $1$."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "$$\\mathrm{softmax}(\\mathbf{x})_i := \\frac{\\exp (x_i)}{\\sum_j \\exp(x_j)}$$"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "Note that here we have written the result for the $i$th component of the function $\\mathbf{R}^n \\to \\mathbf{R}^n$. Note also that the function returns a vector of numbers that are positive, and whose components sum to $1$. Thus, in fact, they can be thought of as probabilities.\n\nIn the neural network context, using a `softmax` after the final layer thus allows us to interpret the outputs as probabilities, in our case the probability that the network assigns that a given image represents each possible output value ($0$-$9$)!"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 5\n\nMake a neural network with one single layer, using the function $\\sigma$, and a final `softmax`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Training"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "As we know, **training** consists of iteratively adjusting the model's parameters to decrease the `loss` function. Which parameters need to be adjusted? All of them!\n\nSince the `loss` function contains a call to the `model` function, calling `back!` on the result of the loss function updates the information about the gradient of the loss function with respect to *every node in the network!*:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "l = loss(X, Y)\n\nFlux.Tracker.back!(l)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "This is what is going on inside the `train!` function. \nIn fact, `train!(loss, data, opt)` iterates over each object in `data` and runs this function.\nFor this reason, `data` must consist of an iterable object that returns pairs `(X, Y)` at each step."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "The simplest possibility is"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "data = ((X, Y), )  # one-element tuple"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "Alternatively, we can make one call to the `train!` function iterate over several copies of `data`, using `repeated`. This is an **iterator**; it does not copy the data 100 times, which would be very wasteful; it just gives an object that repeatedly loops over the same data:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "dataset = Base.Iterators.repeated((X, Y), 100)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 6\n\nTrain the model on a subset of $N$ images with $N = 5000$."
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "N = 5_000\nX, Y = rewrite(1:N)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "The function `loss` evaluated on the matrices gives the overall loss:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "loss(X, Y)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "@time Flux.train!(loss, data, opt)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "@time Flux.train!(loss, dataset, opt)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "This is (approximately) equivalent to just doing a `for` loop to run the previous `train!` command 100 times."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Using callbacks"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "The `train!` function can take an optional keyword argument, `cb` (short for \"*c*all*b*ack\"). A callback function is a function that you provide as an argument to a function `f`, which \"calls back\" your function every so often.\n\nThis provides the possibility to provide a function that is called at each step or every so often during the training process.\nA common use case is to provide a visual trace of the training process by printing out the current value of the `loss` function:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "callback() = @show(loss(X, Y))\n\nFlux.train!(loss, data, opt; cb = callback)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "Flux.train!(loss, dataset, opt; cb = callback)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "However, it is expensive to calculate the complete `loss` function and it is not necessary to output it every step. So `Flux` also provides a function `throttle`, that provides a mechanism to call a given function at most once every certain number of seconds:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "Flux.train!(loss, dataset, opt; cb = Flux.throttle(callback, 1))"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "for i in 1:100\n    Flux.train!(loss, dataset, opt; cb = Flux.throttle(callback, 1))\nend"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Testing phase"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "We now have trained a model, i.e. we have found the parameters `W` and `b` for the network layer(s). In order to **test** if the learning procedure was really successful, we check how well the resulting trained network performs when we test it with images that the network has not yet seen! \n\nOften, a dataset is split up into \"training data\" and \"test (or validation) data\" for this purpose, and indeed the MNIST data set has a separate pool of training data. We can instead use the images that we have not included in our reduced training process."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 7\n\nTake the next 100 images after those that were used for training. How well does it do?"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "X_test, Y_test = rewrite(N+1:N+100)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "loss(X_test, Y_test)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "display(images[N+1])\nlabels[N+1]"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "[model(X_test[:,1]) Y_test[:,1]]"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "loss(X_test[:,1], Y_test[:,1])"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "loss(X_test, Y_test)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 8\n\nUse the `indmax` function to write a function `prediction` that reports which digit `model` predicts, as the index with the maximum weight."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 9\n\nCount the number of correct predictions over the whole data set, and hence the percentage of images that are correctly predicted. [This percentage is what is used to compare different machine learning techniques.]"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Improving the prediction"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "So far we have used a single layer. In order to improve the prediction, we probably need to use more layers."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 10\n\nIntroduce an intermediate, hidden layer. Does it give a better prediction?"
      ],
      "metadata": {}
    }
  ],
  "nbformat_minor": 2,
  "metadata": {
    "language_info": {
      "file_extension": ".jl",
      "mimetype": "application/julia",
      "name": "julia",
      "version": "0.6.2"
    },
    "kernelspec": {
      "name": "julia-0.6",
      "display_name": "Julia 0.6.2",
      "language": "julia"
    }
  },
  "nbformat": 4
}
